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dnd  the  biological  and  physical  sciences. j  Often,  the  heteroscedasticity  is  modeled  as  a 
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asymptotic  theory  implies  that  how  one  xdst  mates  the  variance  function,  in  particular  the 
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parameter  estimates;  however,  there  is  evidence  both  in  practice  and  higher  order  theory  to 

suggest  that  how  one  estimates  the  variance  function  docs  matter.  Further,  in  some  settings 

estimation  of  the  variance  function  is  of  independent  interest  or  plays  an  important  role  ir 

estimation  of  other  quantities.  TrnHhis  paper  study  variance  function  estimation  in  a 

unified  way,  focusing  on  common  methods  proposed  in  the  statistical  and  other  literature, 

in  order  to  make  both  general  observations  and  compare  different  estimation  schemes.  -We-' 

shotf  there  are  significant  differences  in  both  efficiency  and  robustness  for  many  common 
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We  develop  general  theory^for  variance  function  estimation,  focusing  on  estimation 
of  the  structural  parameters  and  including  most  methods  in  common  use  in  our  development. 
The  general  qualitative  conclusions  are  these.  First,  most  variance  function  estimation 
procedures  can  be  looked  upon  as  regressions  with  "responses"~being  transformations  of 
absolute  residuals  from  a  preliminary  fit  or  sample  standar  deviations  from  replicates 
at  a  design  point.  Our  cono-tus ion  is  that^  the  former  is  typically  more  efficient,  but 
not  uniformly  so.  Secondly,  for  variance  function  estimates  based  on  transformations  of 
absolute  residuals,  we  show  that  efficiency  is  a  monotone  function  of  the  efficiency 
of  the  fit  from  which  the  residuals  are  formed,  at  least  for  symmetric  errors.  Our  con¬ 
clusion  is  that  one  should  iterate  so  that  residuals  are  based  on  generalized  least 
squares.  Finally,  robustness  issues  are  of  even  more  importance  here  than  in  estimation 
of  a  regression  function  for  the  mean.  The  loss  of  efficiency  of  the  standard  method 
away  from  the  normal  distribution  is  much  more  rapid  than  in  the  regression  problem.  /ZL — 


As  an  example  of  the  type  of  model  and  estimation  methods  we  consider,  for  observation 
-  covariance  pairs  (Y^,x^),  one  may  model  the  variance  as  proportional  to  a  power  of 

the  mean  response,  e.g., 


E(Y.)  =  f(x.,8);  var(Y.)  =  o  f(x.,8)u,  f(x.,6)  >  0, 


where  f(x.,B)  is  the  possibly  nonlinear  mean  function  and  0  is  the  structural  parameter 
of  interelt.  "Regression  methods"  for  estimation  of  0  and  o  based  on  residuals 


r^  =  Y^  -  f(x^,6*)  for  some  regression  fit  6*  involve  minimizing  a  sum  of  squares  where 
some  function  T  of  the  Ir.l  plays  the  role  of  the  "responses"  and  an  appropriate  functic 


some  function  T  of  the  |r.|  plays  the  role  of  the  "responses"  and  an  appropriate  function 

1  2 
of  the  variance  plays  the  role  of  the  "regression  function."  For  example,  if  T(x)  =  x  , 


the  responses  would  be  r^,  and  the  form  of  the  regression  function  would  be  suggested 


2  2  2  0 

by  the  approximate  fact  E  (r.)  -  o  f (x.,6*)  .  One  could  weight  the  sum  of  squares 

l  1  2 

appropriately  by  considering  the  approximate  variance  of  r^ .  For  the  case  of  replication 


at  each  x^,  some  methods  suggest  replacing  the  r^  in  the  function  T  by  sample  standard 


deviations  at  each  x..  Other  functions  T,  such  as  T(x)  =  x  or  log  x  have  also  been 
proposed.  _ _ 
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ABSTRACT 


Heteroscedastic  regression  models  are  used  in  fields  including 
economics,  engineering,  and  the  biological  and  physical  sciences.  Often, 
the  heteroscedastic! ty  is  modeled  as  a  function  of  the  covariates  or  the 
regression  and  other  structural  parameters.  Standard  asymptotic  theory 
implies  that  how  one  estimates  the  variance  function,  in  particular  the 
structural  parameters,  has  no  effect  on  the  first  order  properties  of  the 
regression  parameter  estimates;  however,  there  is  evidence  both  in  practice 
and  higher  order  theory  to  suggest  that  how  one  estimates  the  variance 
function  does  matter.  Further,  in  some  settings,  estimation  of  the  variance 
function  is  of  independent  interest  or  plays  an  important  role  in  estimation 
of  other  quantities.  In  this  paper,  we  study  variance  function  estimation 
in  a  unified  way,  focusing  on  common  methods  proposed  in  the  statistical  and 
other  literature,  in  order  to  make  both  general  observations  and  compare 
different  estimation  schemes.  We  show  there  are  significant  differences  in 
both  efficiency  and  robustness  for  many  common  methods. 

We  develop  a  general  theory  for  variance  function  estimation,  focusing 
on  estimation  of  the  structural  parameters  and  including  most  methods  in 
common  use  in  our  development.  The  general  qualitative  conclusions  are 
these.  First,  most  variance  function  estimation  procedures  can  be  looked 
upon  as  regressions  with  "responses"  being  transformations  of  absolute 
residuals  from  a  preliminary  fit  or  sample  standard  deviations  from 
replicates  at  a  design  point.  Our  conclusion  is  that  the  former  is 
typically  more  efficient,  but  not  uniformly  so.  Secondly,  for  variance 


f.v 


VM* 


£ 


.  .  .  .  v  /.v«,vv v, v ,y. 


function  estimates  based  on  transformations  of  absolute  residuals,  we  show 
that  efficiency  is  a  monotone  function  of  the  efficiency  of  the  fit  from 
which  the  residuals  are  formed,  at  least  for  symmetric  errors.  Our 
conclusion  is  that  one  should  iterate  so  that  residuals  are  based  on 
generalized  least  squares.  Finally,  robustness  issues  are  of  even  more 
importance  here  than  in  estimation  of  a  regression  function  for  the  mean. 
The  loss  of  efficiency  of  the  standard  method  away  from  the  normal 
distribution  is  much  more  rapid  than  in  the  regression  problem. 

As  as  an  example  of  the  type  of  model  and  estimation  methods  we 
consider,  for  observation  -  covariate  pairs  (Y^.x^).  one  may  model  the 
variance  as  proportional  to  a  power  of  the  mean  response,  e.g. , 

E(Yj)  =  f(xi.P)  ;  var^)  =  a  ffx^p)9.  ffx^P)  >  0. 

where  ((x^.P)  is  the  possibly  nonlinear  mean  function  and  0  is  the 
structural  parameter  of  interest.  "Regression  methods"  for  estimation  of  0 
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and  a  based  on  residuals  r^  =  Y^  -  f(x^,/?M)  for  some  regression  fit  PH 

involve  minimizing  a  sum  of  squares  where  some  function  T  of  the  | r ^ |  plays 

the  role  of  the  "responses"  and  an  appropriate  function  of  the  variance 
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plays  the  role  of  the  "regression  function.”  For  example,  if  T(x)  =  x  ,  the 
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responses  would  be  r^,  and  the  form  of  the  regression  function  would  be 
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suggested  by  the  approximate  fact  E  (r*)  X  a  f(x  ,8  )  One  could  weight 

the  sum  of  squares  appropriately  by  considering  the  approximate  variance  of 
2 

r^.  For  the  case  of  replication  at  each  x^,  some  methods  suggest  replacing 
the  Tj  in  the  function  T  by  sample  standard  deviations  at  each  x^ .  Other 
functions  T,  such  as  T(x)  s  x  or  log  x  have  also  been  proposed. 


Consider  a  heteroscedastic  regression  model  for  observable  data  Y: 


(1.1) 


EYi  =  nt  =  f(x,./3):  Var  (Y< )  =  aV(z4.P.0) 


Here,  {x^}  are  the  design  vectors.  p(p  x  1)  is  the  regression  parameter,  f  is 


the  mean  response  function,  and  the  variance  function  g  expresses  the 


heteroscedastic! ty.  where  {z^}  are  known  vectors,  possibly  the  {x^},  a  is  an 
unknown  scale  parameter,  and  0(r  x  1)  is  an  unknown  parameter.  For  example. 


the  variance  may  be  modeled  as  proportional  to  a  power  of  the  mean: 


(1.2) 


g{zi.p.e)  =  f (Xj ,p)  >  o. 


One  might  also  model  the  variance  as  quadratic  in  the  predictors,  i.e.. 


a  g(z1,p,0)  =  1  +  01xi  +  02xi# 


or  by  an  expanded  power  of  the  mean  model,  i.e.. 


(1.3) 


o' 2  g 2(zi.p.e)  =  0J  +  e2  ((xi,p)  3. 


Box  and  Meyer  (1986)  use 


g(z j.P.0)  =  exp(zj0). 


An  important  feature  of  (l.l)  is  that  no  assumption  about  the  distribution  of 
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the  (Y^)  has  been  made  other  than  that  of  the  form  of  the  first  two  moments. 
Models  which  may  be  regarded  as  special  cases  of  (1.1)  are  used  in  diverse 
fields,  including  radioimmunoassay,  econometrics,  pharmokinetic  modeling, 
enzyme  kinetics  and  chemical  kinetics  among  others.  The  usual  emphasis  is  on 
estimation  of  j3  with  estimation  of  the  variances  as  an  adjunct. 

The  most  common  method  for  estimating  P  is  generalized  least  squares,  in 
which  one  estimates  g(z ^,0,0)  by  using  an  estimate  of  0  and  a  preliminary 
estimate  of  P  and  then  performs  weighted  least  squares;  see,  for  example, 
Carroll  and  Ruppert  (1982a)  and  Box  and  Hill  (1974).  This  might  be  iterated, 
with  the  preliminary  estimate  replaced  by  the  current  estimate  of  p,  a  new 
estimate  of  0  obtained  and  the  process  repeated.  Standard  asymptotic  theory  as 
in  Carroll  and  Ruppert  (1982a)  or  Jobson  and  Fuller  (1980)  shows  that  as  long 
as  the  preliminary  estimators  for  the  parameters  of  the  variance  function  are 
consistent,  all  estimators  of  P  obtained  in  this  way  will  be  asymptotically 
equivalent  to  the  weighted  least  squares  estimator  with  known  weights. 

There  is  evidence  that  for  finite  samples,  the  better  one’s  estimate  of  0. 
the  better  one’s  final  estimate  of  j3.  Williams  (1975)  states  that  "both 
analytic  and  empirical  studies. .. Indicate  that. ..the  ordering  of  efficiency  (of 
estimates  of  /?)... in  small  samples  is  in  accordance  with  the  ordering  by 
efficiency  (of  estimates  of  0).’’  Rothenberg  (1984)  shows  via  second  order 
calculations  that  if  g  does  not  depend  on  P,  when  the  data  are  normally 
distributed  the  covariance  matrix  of  the  generalized  least  squares  estimator  of 
P  is  an  Increasing  function  of  the  covariance  matrix  of  the  estimator  of  0. 

Second  order  asymptotics  provide  only  a  weak  Justification  for  studying 
the  properties  of  variance  function  estimates.  Instead,  our  thesis  is  that 
estimation  of  the  structural  variance  parameter  0  is  of  independent  interest. 
In  many  engineering  applications,  an  important  goal  is  to  estimate  the  error 
made  in  predicting  a  new  observation;  this  can  be  obtained  from  the  variance 


function  once  a  suita  le  estimate  of  0  Is  available.  In  chemical  and 
biological  assay  problems,  issues  of  prediction  and  calibration  arise.  In  such 
problems,  the  estimator  of  0  plays  a  central  role.  As  motivation  for  the  study 
of  variance  function  estimation,  in  Section  2  we  discuss  the  problems  of 
calibration  and  prediction  in  the  case  of  heteroscedasticity.  For  a  formal 
investigation  of  how  the  statistical  properties  of  prediction  intervals  and 
calibration  constructs  such  as  the  minimal  detectable  concentration  will  be 
highly  dependent  on  how  one  estimates  0;  see  Carroll,  Davidian  and  Smith 
(1986).  In  off-line  quality  control,  the  emphasis  is  not  only  on  the  mean 
response  but  also  on  its  variability:  Box  and  Meyer  (1986)  state  that  "one 
distinctive  feature  of  Japanese  quality  control  improvement  techniques  is  the 
use  of  statistical  experimental  design  to  study  the  effect  of  a  number  of 
factors  on  variance  as  well  as  the  mean.”  The  goal  is  to  adjust  the  levels  of 
a  set  of  experimental  factors  to  bring  the  mean  of  the  responses  to  some  target 
value  while  minimizing  standard  deviation;  the  problem  involves  simultaneous 
consideration  of  both  mean  and  variability,  where  the  latter  may  be  a  function 
of  the  mean,  see  Box  (1986)  and  Box  and  Ramirez  (1986).  These  authors  advocate 
methods  based  on  data  transformations  to  account  for  the  heteroscedasticity  in 
separating  the  factors  into  those  affecting  dispersion  but  not  location,  those 
affecting  location  but  not  dispersion,  and  those  affecting  neither.  One 
similarly  might  employ  effective  estimation  of  variance  functions  in  this 
application.  We  briefly  discuss  the  relationship  between  variance  function 
estimation  and  one  type  of  data  transformation  in  Section  3. 

It  should  be  evident  from  this  brief  review  that  far  from  being  only  a 
nuisance  parameter,  the  structural  variance  parameter  0  earn  be  an  important 
part  of  a  statistical  analysis.  The  above  discussion  suggests  the  need  for  a 
unified  investigation  of  estimation  of  variance  functions,  in  particular, 
estimation  of  the  structural  parameter  0.  Previous  work  in  the  literature 


4 

tends  to  treat  various  special  cases  of  (1.1)  as  different  models  with  their 
own  estimation  methods.  The  intent  of  this  paper  is  to  study  parametric 
variance  function  estimation  in  a  unified  way.  Nonparametric  variance  function 
estimation  has  also  been  studied,  see  for  example  Carroll  (1982);  we  will 
confine  our  study  to  the  parametric  setting. 

Parametric  variance  function  estimation  may  be  thought  of  as  a  type  of 
regression  problem  in  which  we  try  to  understand  variance  as  a  function  of 
known  or  estimable  quantities,  and  in  which  6  plays  the  part  of  a  "regression" 
parameter.  The  major  insight  which  allows  for  a  unified  study  is  that  the 
absolute  residuals  from  the  current  fit  to  the  mean  or  the  sample  standard 
deviations  from  replicates  are  basic  building  blocks  for  analysis.  At  the 
graphical  level,  this  means  that  transformations  of  the  absolute  residuals  and 
sample  standard  deviations  can  be  used  to  gain  insight  into  the  structure  of 
the  variability  and  to  suggest  parametric  models.  For  estimation,  a  major 
contribution  is  to  point  out  that  most  of  the  methods  proposed  in  the 
literature  are  (possibly  weighted)  regressions  of  transformations  of  the  basic 
building  blocks  on  their  expected  values.  Many  exceptions  to  this  are  dealt 
with  in  this  article  as  well. 

Our  study  yields  these  major  qualitative  conclusions.  As  stated  here, 
they  apply  strictly  only  to  symmetric  error  distributions,  but  they  are  fairly 
definitive,  and  one  is  unlikely  to  be  too  successful  ignoring  them  in  practice. 

(I).  Robustness  plays  a  great  role  in  the  efficiency  of  variance  function 
estimation,  probably  even  greater  than  in  estimation  of  a  mean  function.  For 
example,  if  the  variance  does  not  depend  on  the  mean  response,  the  standard 
method  will  be  normal  theory  maximum  likelihood  as  in  Box  &  Meyer  (1986).  A 
weighted  analysis  of  absolute  residuals  yields  an  estimator  only  12X  less 
efficient  at  the  normal  model  which  rapidly  gains  efficiency  over  maximum 
likelihood  for  progressively  heavier-tailed  distributions.  This  slope  of 


Improvement  is  much  larger  than  is  typical  in  regression  on  means.  For  a 
standard  contaminated  normal  model  for  which  the  best  robust  estimators  have 
efficiency  125%  with  respect  to  least  squares,  the  absolute  residual  estimator 
of  the  variance  function  has  efficiency  200%. 

(II) .  We  obtain  implications  for  fit  to  the  means  upon  which  the 
residuals  are  based.  It  has  been  our  experience  that  unweighted  least  squares 
residuals  yield  unstable  estimates  of  the  variance  function  when  the  variances 
depend  on  the  mean.  This  is  confirmed  in  our  study,  in  the  sense  that  the 
asymptotic  efficiency  of  the  variance  function  estimators  is  an  increasing 
function  of  the  efficiency  of  the  current  fit  to  the  means.  Thus,  we  suggest 
the  use  of  iterative  weighted  fitting,  so  that  the  variance  function  estimate 
is  based  on  generalized  least  squares  residuals.  As  far  as  we  can  tell,  this 
part  of  our  paper  is  one  of  the  first  formal  justifications  for  iteration  in  a 
generalized  least  squares  context. 

(III) .  It  is  standard  in  many  applied  fields  to  take  m  replicates  at  each 
design  point,  where  usually  m  £  4.  Rather  than  using  (transformations  of) 
absolute  residuals  for  estimating  variance  function  parameters,  one  might  use 
the  sample  standard  deviations.  We  develop  an  asymptotic  theory  from  which  we 
obtain  the  efficiency  of  this  substitution.  The  effect  is  typically,  although 
not  always,  a  loss  of  efficiency,  at  least  when  there  are  m  i  4  replicates. 
The  clearest  results  occur  when  the  variance  does  not  depend  on  the  mean. 
Normal  theory  maximum  likelihood  is  a  weighted  regression  of  squared  residuals; 
the  corresponding  method  would  be  a  weighted  regression  based  on  sample 
variances.  Using  the  latter  entails  a  loss  of  efficiency,  no  matter  what  the 
underlying  distribution.  For  normally  distributed  data,  the  efficiency  is 
(m-l)/m,  thus  being  only  50%  for  duplicates.  For  other  methods,  using  the 
replicate  standard  deviations  can  be  more  efficient.  This  is  particularly  true 
of  a  method  due  to  Harvey  (1976),  which  is  based  on  the  logarithm  of  absolute 


residuals.  A  small  absolute  residual,  which  seems  to  always  occur  in  practice, 
can  wreak  havoc  with  this  method.  This  is  consistent  with  our  influence 


function  calculations,  so  that  we  suggest  some  trimming  of  the  smallest 
absolute  residuals  before  applying  Harvey's  method. 

(IV).  Our  results  indicate  that  the  precision  of  estimates  0  is 
approximately  independent  of  a.  Also,  in  the  power  of  the  mean  model  (1.2), 
the  efficiency  of  a  regression  estimator  improves  as  the  relative  range  of 
values  of  the  mean  response  increases;  efficiency  depends  on  the  spread  of  the 
logarithms  of  means,  not  their  actual  values.  This  helps  explain  why  in 
assays,  estimating  variances  is  typically  much  harder  than  estimating  means. 

In  Section  2  we  discuss  the  prediction  and  calibration  problems  as  a 
motivating  example  of  a  situation  in  which  variance  function  estimation  is  of 
key  importance.  In  Section  3  we  describe  a  number  of  methods  for  estimation  of 
0.  We  do  not  discuss  robust  methods,  see  Giltinan,  Carroll  and  Ruppert  (1986). 
In  Section  4  we  present  an  asymptotic  theory  for  a  general  estimator  of  0  whose 
construction  encompasses  the  methods  of  Section  3.  Section  5  contains  examples 
of  specific  applications  of  our  theory  and  a  discussion  of  the  implications  of 
our  formulation.  Sketches  of  proofs  are  presented  in  Appendix  A. 


2.  AN  EXAMPLE:  THE  ROLE  OF  VARIANCE  ESTIMATION  IN  PREDICTION  AND 
CALIBRATION  PROBLEMS 

One  example  in  which  heterogeneity  of  variation  occurs  is  in  calibration 
experiments  in  the  physical  and  biological  sciences,  in  which  one  fits  a  model 
such  as  (1.1)  to  a  sample  (y^x^),  i  =  1.....N.  The  {x^}  may  be  concentrations 
of  a  substance  and  the  {y^}  counts  or  intensity  levels  which  vary  with 
concentration.  The  interest  lies  in  using  the  estimated  regression  to  make 


inference  about  a  pair  {yo-x0}  which  is  independent  of  the  original  data  set. 
One  may  wish  to  obtain  point  and  interval  predictors  for  yQ  in  the  case  xq  is 
known  (prediction)  or  estimate  xq  in  the  case  yQ  only  is  known  (calibration), 
see  Rosenblatt  and  Speigelman  (1981).  As  a  motivating  example  for  considering 
estimation  of  variance  functions  as  an  independent  problem,  we  describe  the 
primary  role  of  form  and  estimation  of  the  variance  function  in  construction  of 
prediction/calibration  intervals  in  the  case  of  heteroscedasticity. 

Throughout  this  discussion  assume  Xj  =  z so  that  we  may  write  the 
variance  function  as  g(Xj,0,0),  and  assume  that  the  data  are  approximately 
normally  distributed.  Given  xq,  the  standard  point  estimate  of  the  response  yQ 

A  A  /V 

is  f(xo>/3),  where  0  is  some  estimate  for  p.  For  any  consistent  estimator  P  of 

P.  under  (1.1)  the  variance  in  the  error  made  by  the  prediction  is,  for  large 

a  2  2 

sample  sizes,  var{yQ  -  f(xQ,0)}  ~  a  g  (x  ,0,0),  so  that  the  error  in 
prediction  is  determined  mainly  by  the  variance  function  a  g  (xq,0,0)  and  not 
the  original  data  set  itself.  An  approximate  (l-a)100%  confidence  interval  for 
yQ  is  I(xq)  =  {  all  y  in  the  interval  f (x^.0)  ±  tj_^2  a  g(xo*0-®)  )•  here 
t^_^2  *s  the  (l-a/2)  percentage  point  of  the  t  distribution  with  (N-p)  degrees 

/V  A 

of  freedom,  and  a  and  0  are  estimates.  If  the  parameters  are  estimated  by  a 
weighted  analysis  such  as  generalized  least  squares  assuming  (1.1),  all 
estimates  are  consistent  and  the  prediction  interval  becomes 

(2.1)  I(xQ)  £  {  all  y  in  the  interval  f(xQ,0)  ±  ^ ^  a  g (xq.0.0)  }. 


If  one  were  to  ignore  the  heterogeneity,  the  interval  would  be  given  by  I^(xq) 

=  {  all  y  in  the  interval  f(xQ,0)  ±  tj-a/2  °  ^  ‘  ^ or  501  unweighted  analysis, 

2  ~2  2 
however,  a  would  be  estimated  by  the  unweighted  mean  squared  error  ~  a 

-12  2  2 

N  2  g  (x^,0,0)  =  a  gjy  for  large  N.  Thus,  the  unweighted  prediction  interval 


satisf ies 


UNWEIGHTED  =  DASHED,  WEIGHTED  =  SOLID 


COUNT 


Figure  2. 
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(2.2)  I^x^  ~  ^  al*  y  ln  the  interval  f(xo*0)  ±  tl-^/2  °  % 

Comparing  (2.1)  and  (2.2).  we  see  that  where  the  variability  is  small,  the 
unweighted  interval  will  be  too  long  and  hence  pessimistic,  and  conversely 
where  the  variance  is  large.  Figure  1  illustrates  a  typical  case  of  this 
phenomenon  for  the  simple  situation  of  an  approximately  linear  mean  response 
function  where  variability  increases  with  mean  response. 

The  situation  is  the  same  for  calibration.  For  simplicity  in  discussing 
calibration,  assume  f(x,0)  is  strictly  Increasing  or  decreasing  in  x.  Given 

A 

yQ,  the  usual  estimate  of  Xq  is  that  value  satisfying  yQ  =  f(x,/3).  The  common 
confidence  interval  for  x  is  the  set  of  all  x  values  for  which  y  falls  in  the 
prediction  interval  I(x);  this  interval  is  actually  a  (l-a)lOOX  confidence 
interval  for  the  unknown  xq.  Again,  the  effect  of  not  weighting  is  intervals 
which  are  too  long  for  xq  where  the  variance  is  small  and  the  opposite  when  the 
variance  is  large.  We  are  not  familiar  with  any  extensive  investigation  of 
calibration  confidence  Intervals  for  heteroscedastic  models,  although  see 
Watters.  Carroll  and  Splegelman  (1987).  Figure  2  represents  an  example  of  this 
phenomenon  ln  the  situation  of  Figure  1  for  the  region  of  small  variance. 

The  key  point  of  this  discussion  is  that  when  heterogeneity  of  variance  is 
present,  how  well  one  models  and  estimates  the  variances  will  have  substantial 
impact  on  prediction  and  calibration  based  on  the  estimated  mean  response, 
since  the  form  of  the  Intervals  depends  on  the  form  of  the  variance  function. 
Some  theoretical  work  has  been  done  verifying  the  implications  of  this 
discussion;  for  an  investigation  of  how  the  statistical  properties  of 
estimators  for  calibration  quantities  depend  on  those  of  the  estimator  0.  see 
Carroll,  David lan  and  Smith  (1986)  and  Carroll  (1987). 


.V* 
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3.  ESTIMATION  OF  9 


We  now  discuss  the  form  and  motivation  for  several  estimators  of  0  in 


(1.1).  In  what  follows,  let  /3  be  a  preliminary  estimator  for  p.  This  could 


be  unweighted  least  squares  or  the  current  estimate  in  an  iterative  reweighted 


least  squares  calculation.  Let  =  {Yj  -  f  (x^  ,P)}/{og(z.  ,p,  0)}  denote  the 


errors  so  that  E  =  0  and  E  e*  =  1,  and  denote  the  residuals  by  r^  =  Y^  - 


ffXj.p^).  We  consider  some  methods  requiring  rrij  £  2  replicates  at  each  of  M 


design  points;  for  simplicity,  we  consider  only  the  case  of  equal  replication 


m  =  m  and  write  in  obvious  fashion  (Y,  .},  j  =  1 . m.  to  denote  the  m 


observations  at  where  appropriate,  so  that  N  =  Mm  is  the  total  number  of 


observations.  In  this  case,  let  YJ#  and  s^  denote  the  sample  mean  and  standard 


deviation  at  x. .  For  consistency  of  exposition,  however,  we  denote  the  sum 


over  all  observations  as 


instead  of 


When  we  speak  of  replacing  absolute  residuals  {  | r ^  | }  by  sample  deviations  {s^} 
in  the  case  of  replication,  |r^|  or  s^  appears  m  times  in  the  sum. 


3. 1  Regression  Methods 


Pseudo- 1 1 ke 1 i hood .  Given  P  ,  the  pseudo- 1  ike 1 i hood  estimator  maximizes 


the  normal  log- likelihood  l(P  ,d, a) ,  where 


(3.1) 


HP.O.cr)  =  -N  log  a  -  l^_1log{g(zi,p,0)} 

-  (2o2)-1^=1  {Y1-f(xi.p)}2/g2(zi,p,0). 


see  Carroll  and  Ruppert  (1982a).  Here  the  term  "pseudo- likelihood"  is  used  as 


I 


/-A 


in  Gong  and  Samaniego  (1961).  Generalizations  of  pseudo-likelihood  for  robust 


estimation  have  been  studied  by  Carroll  and  Ruppert  (1982a)  and  Giltinan, 


Carroll  and  Ruppert  (1986). 


-east  squares  on  squared  residuals.  Besides  pseudo- likelihood,  other 


methods  using  squared  residuals  have  been  proposed.  The  motivation  for  these 


methods  is  that  the  squared  residuals  have  approximate  expectation 


a g  (z J./3.0),  see  Jobson  and  Fuller  (1980)  and  Amemiya  (1977).  This  suggests  a 


nonlinear  regression  problem  in  which  the  "responses"  are  (r  }  and  the 


'regression  fu-ction"  is  o' g  (z^.13^.0).  The  estimator  8g^  minimizes  in  0  and  a 


_N  ,2  2  2,%  axx2 

^1=1  {ri  "  a  *  (Zi»P*,e)}  ‘ 


For  normal  data  the  squared  residuals  have  approximate  variance  o  g  (z^/3.0); 


in  the  spirit  of  generalized  least  squares,  this  suggests  the  weighted 


estimator  which  minimizes  in  6  and  o 


(3.2) 


2?=i  (r2  -  a2g2(z1iM.e)}2/g4(ziiM.0J, 


where  0^  is  a  preliminary  estimator  for  0,  0g^  for  example.  Full  iteration. 


when  it  converges,  would  be  equivalent  to  pseudo-likelihood. 


Accounting  for  the  effect  of  1i»vBrnpri».  One  objection  to  methods  such  as 


pseudo- likelihood  and  least  squares  based  on  squared  residuals  is  that  no 


compensation  is  made  for  the  loss  of  degrees  of  freedom  associated  with 


preliminary  estimation  of  /3.  For  example,  the  effect  of  applying 


pseudo- likelihood  directly  seems  to  be  a  bias  depending  on  p/N.  For  settings 


such  as  fractional  factorials  where  p  is  large  relative  to  N  this  bias  could  be 


substantial . 


Bayesian  ideas  have  been  used  to  account  for  loss  of  degrees  of  freedom; 


ms, 


w 

’  '  v'  '  V'  v*  %'  ■-  •.  A  A  A  AA 
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see  Harville  (1977)  and  Patterson  and  Thompson  (1974).  When  g  does  not  depend 
on  P,  the  restricted  maximum  likelihood  approach  of  the  latter  authors  suggests 
in  our  setting  one  estimate  0  from  the  mode  of  the  marginal  posterior  density 
for  0  assuming  normal  data  and  a  prior  for  the  parameters  proportional  to  a 
When  g  depends  on  p.  one  may  extend  the  Bayesian  arguments  and  use  a  linear 
approximation  as  in  Box  and  Hill  (1974)  and  Beal  and  Sheiner  (1986)  to  define  a 
restricted  maximum  likelihood  estimator. 

Let  Q  be  the  N  x  p  matrix  with  ith  row  fp(x^ .  P)Vg(z^ ,/3,0) .  where  fp(x. ,P) 
=  d/dp  (f(xi.p)}.  and  let  H  =  QfC^Q)  be  the  "hat"  matrix  with  diagonal 

element  h^s  hii(P,0);  the  values  {h^}  are  the  leverage  values.  It  turns  out 
that  the  restricted  maximum  likelihood  estimator  is  equivalent  to  an  estimator 
obtained  by  modifying  pseudo- likelihood  to  account  for  the  effect  of  leverage. 
This  characterization,  while  not  unexpected,  is  new;  we  derive  this  estimator 
and  its  equivalence  to  a  modification  of  pseudo- likelihood  in  Appendix  B. 

The  least  squares  approach  using  squared  residuals  can  also  be  modified  to 
show  the  effect  of  leverage.  Jobson  and  Fuller  (1980)  essentially  note  that 
for  nearly  normally  distributed  data  we  have  the  approximations 

ErJ'  o2(l-hii)g2(zi,p,0). 
var  r2  ~  2  a4(l-hli)2g4(zi .0.0) . 


To  exploit  these  approximations  modify  (3.2)  to  minimize  in  0  and  a 


(3.3)  2^=1  (r2  -  a2(l-hiI)g2(ziiM.0)}2/{(l-hii)2g4(z1.p,f.0>t)}. 


where  h^  =  h^JP^.O^)  and  is  a  preliminary  estimator  for  0.  An 
asymptotically  equivalent  variation  of  this  estimator  in  which  one  sets  the 
derivatives  of  (3.3)  with  respect  to  0  and  a  equal  to  0  and  then  replaces  0W  by 


12 


j 

0  can  be  seen  to  be  equivalent  to  pseudo- like  11  hood  In  which  one  replaces 
standardized  residuals  by  studentized  residuals.  While  this  estimator  also 
takes  into  account  the  effect  of  leverage,  it  is  different  from  restricted 
maximum  likelihood. 

Least  squares  on  absolute  residuals.  Squared  residuals  are  skewed  and 
long-tailed,  which  has  lead  many  authors  to  propose  using  absolute  residuals  to 

i  estimate  0;  see  Glejser  (1969)  and  Theil  (1971).  Assume  that 

i 

E|Yi  -  f(xi.p)|  =  qg(z i.0.0). 

j  which  is  satisfied  if  the  errors  (e^)  are  independent  and  identically 

1  distributed.  Mimicking  the  least  squares  approach  based  on  squared  residuals, 

one  obtains  the  estimator  0^_  by  minimizing  in  q  and  0 

2j_l  (IrJ  -  ng(zi  P*.0)}2. 

In  analogy  to  (3.2),  the  weighted  version  is  obtained  by  mimimizing 

l 

|  sj-i  (kj  ~  Vg(zi.PM.0)}2/g2(zi'PM-8tf)- 

l  ** 

j  where  0^  is  a  preliminary  estimator  for  0,  probably  0^.  As  for  least  squares 

estimation  based  on  squared  residuals,  one  presumably  could  modify  this 
approach  to  account  for  the  effect  of  leverage. 

>  Logarithm  method.  The  suggestion  of  Harvey  (1976)  is  to  exploit  the  fact 

that  the  logarithm  of  the  absolute  residuals  has  approximate  expectation  log 
(og(z j.P.0)}.  Estimate  0  by  ordinary  least  squares  regression  of  log  | r ^  |  on 

A 

log  (ogfZj ,  since  if  the  errors  are  Independent  and  identically 
distributed,  the  regression  should  be  approximately  homoscedastic.  If  one  of 


the  residuals  is  near  zero  the  regression  could  be  adversely  affected  by  a 
large  "outlier,"  hence  in  practice  one  might  wish  to  delete  a  few  of  the 
smallest  absolute  residuals,  perhaps  trimming  the  smallest  few  percent. 

3.2  Other  methods 

Besides  squares  and  logarithms  of  absolute  residuals,  other 
transformations  could  be  used.  For  example,  the  square  root  and  2/3  root  would 
typically  be  more  normally  distributed  than  the  absolute  residuals  themselves. 
Such  transformations  appear  to  be  useful,  although  they  have  not  been  used  much 
to  our  knowledge.  Our  asymptotic  theory  applies  to  such  transformations. 

In  a  parametric  model  such  as  (1.1),  joint  maximum  likelihood  estimation 
is  possible,  where  we  use  the  term  maximum  likelihood  to  mean  normal  theory 
maximum  likelihood.  When  the  variance  function  does  not  depend  on  p.  it  can  be 
easily  shown  that  maximum  likelihood  is  asymptotically  equivalent  to  weighted 
least  squares  methods  based  on  squared  residuals.  In  the  situation  in  which 
the  variance  function  depends  on  P  this  is  not  the  case.  In  this  setting,  it 
has  been  observed  by  Carroll  and  Ruppert  (1982b)  and  McCullagh  (1983)  that 
while  maximum  likelihood  estimators  enjoy  asymptotic  optimality  when  the  model 
and  distributional  assumptions  are  correct,  the  maximum  likelihood  estimator  of 
P  can  suffer  problems  under  departures  from  these  assumptions.  This  suggests 
that  Joint  maximum  likelihood  estimation  should  not  be  applied  blindly  in 
practice.  The  theory  of  the  next  section  shows  the  asymptotic  equivalence  of 
maximum  likelihood  with  other  methods  in  a  simplifying  special  case.  Based  on 
this  theory,  we  tend  to  prefer  weighted  regression  methods  even  when  the  data 
are  approximately  normal  for  reasons  of  relative  computational  simplicity. 

While  we  have  chosen  to  describe  the  methods  of  Section  3.1  as  "regression 
methods,"  asymptotically  equivalent  versions  of  such  methods  may  be  derived  by 


,>  t.t  l.l  1,1 
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considering  maximum  likelihood  assuming  some  underlying  distribution.  For 
example,  the  form  of  the  weighted  squared  residuals  method  is  that  of  normal 

A 

theory  maximum  likelihood  with  /3  known  and  0M  replaced  by  0 
(pseudo- likelihood) ;  the  form  of  the  weighted  absolute  residual  method  is  that 
of  maximum  likelihood  assuming  /3  known  and  0^  replaced  by  0  under  the  double 
exponential  distribution.  Thus,  what  we  term  a  regression  method  may  be  viewed 
as  an  approximation  to  maximum  likelihood  assuming  a  particular  distribution. 
We  feel  that  the  regression  interpretation  is  a  much  more  appealing  and  natural 
motivation,  since  no  particular  distribution  need  be  considered  to  obtain  the 
form  of  the  estimators,  only  the  mean-variance  relationship. 

Another  joint  estimation  method  is  the  extended  quasi- likelihood  of  Nelder 
and  Pregibon  (1987)  also  described  in  McCullagh  and  Nelder  (1983).  This 
estimator  is  based  on  assuming  a  class  of  distributions  "nearly”  containing 
skewed  distributions  such  as  the  Poisson  and  gamma.  While  it  may  be  viewed  as 


iteration  between  estimation  of  0  and  a  and  generalized  least  squares  for  p. 


technically  this  scheme  does  not  fit  in  the  general  framework  of  the  next 


section;  an  asymptotic  theory  has  been  developed  elsewhere,  see  Davidian  and 


Carroll  (1987).  A  related  formulation  is  given  by  Efron  (1986). 


Methods  requiring  replicates  at  each  design  point  have  been  proposed  in 


the  assay  literature.  These  methods  do  not  depend  on  the  postulated  form  of 


the  regression  function;  one  reason  that  this  may  be  advantageous  is  that  in 


many  assays  along  with  observed  pairs  (Y^.Xj)  there  will  alst  be  pairs  in 


which  only  is  observed.  A  popular  and  widely  used  method  is  that  of 


Rodbard  and  Frazier  (1975).  If  we  assume 


(3.4) 


g(z1,p,0)  =  g(Mi,zi,0), 


as  in,  for  example,  (1.2)  or  (1.3),  the  method  is  identical  to  the  logarithm 


method  previously  discussed  except  that  one  replaces  |r^ |  by  the  sample 

A 

standard  deviation  s.  and  f(x.  ,/3  )  in  the  "regression"  function  by  the  sample 

mean  Y.  .  As  a  motivation  for  this  and  the  method  of  Harvey,  consider  that 

under  (1.2)  0  is  simply  the  slope  parameter  for  a  simple  linear  regression. 

As  an  alternative,  under  the  assumption  of  independence  and  (3.4),  the 

modified  maximum  likelihood  method  of  Raab  (1981)  estimates  0  by  joint 

2 

maximization  in  the  (M+r+1)  parameters  a  ,0.Pj . of  the  "modified"  normal 

likelihood 

(3.5)  n*f=1{21ra2g2(M1.zi.0))(m_1)/2exp[-^=1(Y1J-pi)2/{2a2gV1.zi.0))] 

The  modification  serves  to  make  the  estimator  of  a  unbiased.  The  idea  here  is 
to  improve  upon  the  regression  method  of  Rod  bard  by  appealing  to  a  maximum 
likelihood  approach  which,  despite  a  parameter  space  increasing  as  the  number 

of  design  points,  is  postulated  to  have  reasonable  properties.  A  related 

method  is  that  in  which  0  and  a  are  estimated  by  maximizing  (3.5)  with 
replaced  by  Y^#,  the  motivation  being  computational  ease  and  evidence  that  this 
estimator  may  not  be  too  different  from  that  of  Raab  in  practice,  see  Sadler 
and  Smith  (1985). 

Table  1  contains  a  summary  of  some  of  the  common  methods  for  variance 
function  estimation  and  their  formulations. 


4.  AN  ASYMPTOTIC  THEORY  OF  VARIANCE  FUNCTION  ESTIMATION 


In  this  section  we  construct  an  asymptotic  theory  for  a  general  class  of 
regress ion- type  estimators  for  0.  Since  our  major  interest  lies  in  obtaining 
general  insights,  we  do  not  state  technical  assumptions  or  details.  In  what 


follows,  in  the  case  of  replication  N  -»  «  in  such  a  way  that  m  remains  fixed. 
The  reader  uninterested  in  this  development  may  wish  to  skip  to  Section  5, 
where  conclusions  and  implications  of  the  theory  are  presented. 


4. 1  Methods  based  on  transformations  of  absolute  residuals 

Write  d^(P)  =  | -  f(Xj,P)|.  Let  T  be  a  smooth  function  and  define  Mi  by 

M^V.O.P)  =  E  [  T{d1(/3)}  ]. 

where  17  is  a  scale  parameter  which  is  usually  a  function  of  a  only.  We 
consider  estimation  of  the  more  general  parameter  17  Instead  of  a  itself  for 
ease  of  exposition,  and  since  a  is  estimated  jointly  with  0  in  regression 

A  A  A 

models,  our  theory  focuses  on  expansions  for  17  and  0  jointly.  If  tj  ,  0^  and  p* 

A  A 

are  any  preliminary  estimators  for  17,  0,  and  p.  define  tj  and  0  to  be  the 

solutions  of 


—  1/9  N  A  ~  AAA 

(4.1)  N  2jsl  HjCn .e,pM)  (T(d i(PM)}  -  M^tj.O.P*)}  /  vi(T7.0H.pM)  =  0. 

where  V^(tj .B.P)  is  a  smooth  function  and  is  a  smooth  function  which  for  the 
estimators  of  Section  3  is  the  partial  derivative  of  with  respect  to  (17, 0). 
In  what  follows,  we  suppress  the  arguments  of  the  functions  M^,  V^,  etc.  when 
they  are  evaluated  at  the  true  values  17,  0,  and  p.  Specific  examples  are 
considered  in  the  next  section. 

The  class  of  estimators  solving  (4.1)  Includes  directly  or  includes  an 
asymptotically  equivalent  version  of  the  estimators  of  Section  3.1.  For 
methods  which  account  for  the  effect  of  leverage,  M. ,  V.  and  will  depend  on 


A  A  A  1/2 

Theorem  4.1.  Let  t)h,  0^  and  PM  be  N  consistent  for  estimating  t), 
Let  T  be  the  derivative  of  T  and  define 


6  and  p. 


B 


3.N 


B2,N 


Cj  =  Ht  [T{d1(P)}  -  Ma]  /  Vi: 

B1.N  -  ""‘^1 

=  -N_1I^=1  (Hi/Vi)  d/dp  {Mi(r 1.0.P)}; 


i=l  (Hi/V1)fp(xi.p)  E  [T{d1(p)}sign(ei)]. 


Then,  under  regularity  conditions  as  N  -*  ®, 


(4.2)  B1N  N 


1/2 


n  -  v 

0-0 


*  +  (B2,N+B3.N»  «1/2(P»-P)  +  V».  □ 


We  may  immediately  make  some  general  observations  about  the  estimator  0 
solving  (4.1).  Note  that  if  the  variance  function  does  not  depend  on  p.  then 

Mj  does  not  depend  on  P  and  hence  Bg  N  ~  For  t^ie  estimators  of  Section  2.1. 

• 

T  is  an  odd  function.  Thus,  if  the  errors  (e^)  are  symmetrically  distributed, 
E[  Hi(di(P)}sign(e1)  ]  =  0  and  hence  B^  N  =  0. 

Corollary  4. lfal.  Suppose  that  the  variance  function  does  not  depend  on  p  and 
the  errors  are  symmetrically  distributed.  Then  the  asymptotic  distributions  of 
the  regression  estimators  of  Section  3.1  do  not  depend  on  the  method  used  to 

A 

obtain  P  .  If  both  of  these  conditions  do  not  hold  simultaneously,  then  the 
asymptotic  distributions  will  depend  in  general  on  the  method  of  estimating  p. 

□ 


The  implication  is  that  in  the  situation  for  which  the  variance  function 
does  not  depend  on  p  and  the  data  are  approximately  symmetrically  distributed, 
for  large  sample  sizes  the  preliminary  estimator  for  p  will  play  little  role  in 


determining  the  properties  of  0.  Note  also  from  (4.2)  that  for  weighted 
methods,  the  effect  of  the  preliminary  estimator  of  0  is  asymptotically 
negligible  regardless  of  the  underlying  distributions. 

A 

The  preliminary  estimator  P*  might  be  the  unweighted  least  squares 
estimator,  a  generalized  least  squares  estimator  or  some  robust  estimator. 
See,  for  example,  Huber  (1981)  and  Giltinan,  Carroll  and  Ruppert  (1986)  for 
examples  of  robust  estimators  for  p.  For  some  vectors  (v^  ^},  these  estimators 
admit  an  asymptotic  expansion  of  the  form 

(4.3)  N1/2(Ph  -  p)  =  N_1/2^=1  *(vN>1.fel)  +  op(l). 

Here  is  odd  in  the  argument  e.  In  case  the  variance  function  depends  on  P. 
^2  N  *  0  ln  8eneral :  however,  if  the  errors  are  symmetrically  distributed  and 

A 

PH  has  expansion  of  form  (4.3),  then  the  two  terms  on  the  right-hand  side  of 
(4.2)  are  asymptotically  independent.  The  following  is  then  immediate. 

Corollary  4.1fbl.  Suppose  that  the  errors  are  symmetrically  distributed  and 
that  has  an  asymptotic  expansion  of  the  form  (4.3).  Then  for  the  estimators 

A 

of  Section  3.1,  the  asymptotic  covariance  matrix  of  0  is  a  monotone 

A 

nondecreasing  function  of  the  asymptotic  covariance  matrix  of  P*.  □ 

By  the  Gauss-Markov  theorem  and  the  results  of  Jobson  and  Fuller  (1980) 
and  Carroll  and  Ruppert  (1982a),  the  implication  of  Corollary  4.1(b)  is  that 
using  unweighted  least  squares  estimates  of  P  will  result  in  inefficient 
estimates  of  0.  This  phenomenon  is  exhibited  in  small  samples  in  a  Monte  Carlo 
study  of  Carroll,  Davidian  and  Smith  (1986).  If  one  starts  from  the  unweighted 
least  squares  estimate,  one  ought  to  iterate  the  process  of  estimating  0  —  use 

A  A  A 

the  current  value  p to  estimate  0  from  (4.1),  use  these  PM  and  0  to  obtain  an 


/  f 


♦  a*  it*  t.V  i^l*i .  %*m'  ‘at.  >.|  i.t  i.|  (j'4>l'ki'l 


updated  0  by  generalized  least  squares  and  repeat  the  process  <6  -  1  more 


times.  It  is  clear  that  the  asymptotic  distribution  of  0  will  be  the  same  for 


•6  £  2  with  larger  asymptotic  covariance  for  “€  =  1,  so  in  principle  one  ought  to 


iterate  this  process  at  least  twice.  See  Carroll.  Wu  and  Ruppert  (1987)  for 


more  on  iterating  generalized  least  squares. 


4.2  Methods  based  on  sample  standard  deviations 


Assume  replication,  and  as  before  let  {s^}  he  the  sample  standard 
deviations  at  each  x^ ,  which  themselves  have  been  proposed  as  estimators  of  the 


variance  in  generalized  least  squares  estimation  of  p.  This  can  be 


disasterous,  see  Jacquez,  Mather  and  Crawford  (1968).  When  replication  exists. 


however,  practitioners  feel  comfortable  with  the  notion  that  the  {s^}  may  be 
used  as  a  basis  for  estimating  variances;  thus,  one  might  reasonably  seek  to 

A 

estimate  0  by  replacing  <^(0*)  by  si  in  (4.1). 

The  following  result  is  almost  immediate  from  the  proof  of  Theorem  4.1  in 


Appendix  A. 


Theorem  4.2.  If  dj(0*)  is  replaced  by  s^  in  (4.1),  then  under  the  conditions 
of  Theorem  4.1  the  resulting  estimator  for  0  satisfies  (4.2)  with  ^  =  0  and 


the  redefinitions 


(4.4a) 


C.  =  (H./V.)(T(s.)  -  Ml; 


(4.4b) 


Mt  =  E  (T(Si)>  =  M^rj.0.0). 


If  the  errors  are  symmetrically  distributed,  then  from  (4.2)  and  Theorem 


4.2,  whether  one  is  better  off  using  absolute  residuals  or  sample  standard 


deviations  in  the  methods  of  Section  3.1  depends  only  on  the  differences 


■uji 

1 


B 
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between  the  expected  values  and  variances  of  T{d^(/3)}  and  Tfs^.  In  Section  5 
we  exhibit  such  comparisons  explicitly  and  show  that  absolute  residuals  can  be 
preferred  to  sample  standard  deviations  in  situations  of  practical  importance. 

4.3  Methods  not  depending  on  the  regression  function 

We  assume  throughout  this  discussion  that  the  variance  function  has  form 
(3.4)  and  replication  is  available.  From  Section  3.1  we  see  that  the 

a 

"regression  function"  part  of  the  estimating  equations  depends  on  ffx^.P^),  so 

a 

that  in  the  general  equation  (4.1)  Mj.  and  1^  all  depend  on  ffx^p^).  In 
some  settings,  one  may  not  postulate  a  form  for  the  for  estimating  0;  the 

A 

method  of  Rodbard  and  Frazier  (1975),  for  example,  uses  s^  in  place  of  d ^ ( ) 
as  in  Section  4.2  and  replaces  f^j.JJ^)  by  the  sample  mean  Yj#.  We  now 
consider  the  effect  of  replacing  predicted  values  by  sample  means  for  the 
general  class  (4.1). 

The  presence  of  the  sample  means  in  the  variance  function  in  (4.1) 

requires  more  complicated  and  restrictive  assumptions  than  the  usual  large 

sample  asymptotics  applied  heretofore.  The  method  of  Rodbard  and  Frazier  and 

the  general  method  (4.1)  with  sample  means  are  functional  nonlinear  errors  in 

variables  problems  as  studied  by  Wolter  and  Fuller  (1982)  and  Stefanski  and 

Carroll  (1985).  Standard  asymptotics  for  these  problems  correspond  to  letting 

-1/2 

a  go  to  zero  at  rate  N  .  In  Section  4.4  we  discuss  the  practical 
Implications  of  a  being  small;  for  now,  we  state  the  following  result. 

a 

Theorem  4.3.  .  Suppose  that  we  replace  f by  Yj#  in  and  in 

(4.1)  and  adopt  the  assumptions  of  Theorems  4.1  and  4.2.  Further,  suppose  that 
as  N  -*  »,  a  -»  0  simultaneously  and 

(i)  N1/2ct  -*  X,  0  i  \  <  »; 


.•.vv.v.v.v.y.^ v V/./ '■  a--  . . . 


J»  “  M  »  ‘ 

%  V  *.  S  %  V  V  S.*  \  s’  .  -  .  . 

- J*  •  •  *  Jt  ■  »  »  *  •  *  fc  *  •  '  ,  v  «.  - 


( i  i  )  n1/2s?=i  Cj  has  a  nontrivial  asymptotic  normal  limit  distribution; 

(iii)  The  {fej}  are  syirmetric  and  i.i.d  ; 

(iv)  ~  I  /  o)  has  uniformly  bounded  k  moments,  some  k  >  2. 

Then  the  results  of  Theorems  4.1  and  4.2  hold  with  Bg  N  =  N  S  D 

This  result  shows  that  under  certain  restrictive  assumptions,  one  may 
replace  predicted  values  by  sample  means  under  replication;  however,  it  is 
important  to  realize  that  the  assumption  of  small  a  is  not  generally  valid  and 
hence  the  use  of  sample  means  may  be  disadvantageous  in  situations  where  these 
asymptotics  do  not  apply.  Further,  relaxation  of  assumption  (iii)  will  result 
in  an  asymptotic  bias  in  the  asymptotic  distribution  of  the  estimator  not 
present  for  estimators  based  on  residuals  regardless  of  the  assumption  of 
symmetry;  see  Appendix  A. 

The  estimator  of  Raab  (1981)  discussed  in  Section  3.2  is  also  a  functional 
nonlinear  errors  in  variables  estimator,  complicated  by  a  parameter  space  with 
size  of  order  N.  Sadler  and  Smith  (1985)  have  observed  that  the  Raab  estimator 
is  often  indistinguishable  from  their  estimator  with  nl  replaced  by  Y^#  in 
(3.5);  such  an  estimator  is  contained  in  the  general  class  (4.1).  Davidian 
(1986)  has  shown  that  under  the  asymptotics  of  Theorem  4.3  and  additional 
regularity  conditions  that  the  two  estimators  are  asymptotically  equivalent  in 
an  important  special  case.  We  may  thus  consider  the  result  of  Theorem  4.3 
relevant  to  this  estimator. 

4.4  Small  a  asymptotics 


In  Section  4.3  technical  considerations  forced  us  to  pursue  an  asymptotic 
theory  in  which  a  is  small.  It  turns  out  that  in  some  situations  of  practical 
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importance  these  asymptotics  are  relevant.  In  particular,  in  assay  data  we 
have  observed  values  for  a  which  are  quite  small  relative  to  the  means.  Such 
asymptotics  are  used  in  the  study  of  data  transformations  in  regression.  It  is 
thus  worthwhile  to  consider  the  effect  of  small  a  on  the  results  of  Sections 
4.1  and  4.2  and  to  comment  on  some  other  implications  of  letting  a  -»  0. 

In  the  situation  of  Theorem  4.1,  if  the  errors  are  symmetrically 
distributed,  then  for  the  estimators  of  Section  3.1,  if  a  -»  0  as  N  -»  ».  then 
there  is  no  effect  for  estimating  the  regression  parameter  p.  In  the  situation 
of  Theorem  4.2,  the  errors  need  not  even  be  symmetrically  distributed.  The 
major  insight  provided  by  these  results  is  that  in  certain  practical  situations 
in  which  a  is  small,  the  choice  of  may  not  be  too  important  even  if  the 
variance  function  depends  on  p. 

Small  a  asymptotics  may  be  used  also  to  provide  insight  into  the  behavior 
of  other  estimators  for  0  which  do  not  fit  into  the  general  framework  of  (4.1). 
It  can  be  shown  that  the  extended  quasi- likelihood  estimator  need  not 
necessarily  be  consistent  for  fixed  a,  but  if  one  adopts  the  asymptotics  of  the 
previous  section,  this  estimator  is  asymptotically  equivalent  to  regression 
estimators  based  on  squared  residuals  as  long  as  the  errors  are  symmetrically 
distributed.  Otherwise,  an  asymptotic  bias  results  which  may  have  implications 
for  Inference  for  0.  For  discussion  see  Davldian  and  Carroll  (1987). 

The  small  a  assumption  also  provides  an  illustration  of  the  relationship 
between  variance  function  estimation  and  data  transformations.  Let  l(y.X)  = 
(y^  -  1)/X,  and  consider  the  model 

(4.5)  E{  *(YitX)  }  =  t(  f (Xj.pj.X  };  var(  ^(Y^X  )  =  a; 

such  "transform  both  sides"  models  are  proposed  and  motivated  by  Carroll  and 
Ruppert  (1984).  For  a  Z  0.  E(Yj)  z  f(xr/3)  and  varfY^  2  a  f  ,P)(  1_X) .  so 
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that  In  (1.2)  we  have  0  ~  1  -  X.  Thus,  when  the  small  a  assumption  is 
relevant.  (4.5)  and  (1.1),  (1.2)  represent  approximately  the  same  model. 

5.  APPLICATIONS  AND  FURTHER  RESULTS 


In  Section  4  we  constructed  an  asymptotic  theory  for  and  stated  some 
general  characteristics  of  regress ion- type  estimators  of  0.  In  this  section  we 
use  the  theory  to  exhibit  the  specific  forms  for  the  various  estimators  of 
Section  3  and  compare  and  contrast  their  properties.  In  our  investigation  we 
rely  on  the  simplifying  assumptions  implied  by  the  theory  of  Section  4r~^in 
particular  the  small  a  asymptotic  approach  in  which  a  -*  0  as  N  -»  ®. 
Throughout,  define  v(l.p.O)  =  log  g(z ^,0,9),  let  Vg(1.0,9)  be  the  column  vector 
of  partial  derivatives  of  u  with  respect  to  0,  let  f(p,0)  be  the  covariance 
matrix  of  v^(i ,0,9) ,  and  let  r(i,0,O)  -  {  1,  ^(i.^.O)}1".  For  simplicity, 
assume  that  the  errors  {e^}  are  Independent  and  identically  distributed  with 
kurtosis  x;  x  =  0  for  normality. 

v 

5. 1  Maximum  likelihood,  pseudo- likelihood,  restricted  maximum  likelihood  and 
weighted  squared  residuals. 


Writing  17  =  log  a,  we  have  T(x)  =  x^,  =  exp(2n)  g^z^.p,©),  =  M^,  H* 

=  aMj/afrj.oVand  E  [  Tfd^p)}  signftj)  ]  =  2  E  [  Y{  -  f(xt>p)]  =  0  so  that 
83  N  S  0  regardless  of  the  underlying  distributions.  If  h  -*  0  such  that  N^^h 
-*  0  for  methods  accounting  for  the  effect  of  leverage,  then  all  of  these 
methods  admit  an  expansion  of  the  form  (4.2)  with  ^  =  0.  The  expansion  will 

A 

be  different  depending  on  whether  0M  is  a  generalized  least  squares  estimator 
for  0  or  full  maximum  likelihood,  since  the  maximum  likelhood  estimator  has  an 


expansion  quadratic  in  the  errors  while  that  of  the  generalized  least  squares 
estimator  is  linear  in  the  see  Carroll  and  Ruppert  (1982b).  The 

implication  is  that  regression  methods  based  on  iterated  weighted  squared 
residuals  and  full  maximum  likelihood  are  different  in  general  asymptotically. 
Regardless  of  the  underlying  distributions,  for  fixed  a,  Davidian  (1986)  has 
shown  that  the  asymptotic  covariance  matrix  of  the  former  methods  increases 
without  bound  as  a  function  of  a  while  that  of  maximum  likelihood  remains 
bounded  for  all  a.  Further,  a  simple  comparison  of  the  two  covariances  reveals 
that  under  reasonable  conditions  maximum  likelihood  has  smaller  asymptotic 
covariance  as  long  as  x  £  2.  While  these  facts  may  suggest  a  preference  for 
full  maximum  likelihood  even  away  from  normality,  the  computational  and  model 
robustness  considerations  mentioned  earlier  may  make  this  preference  tenuous. 

Generalized  least  squares  and  maximum  likelihood  estimators  for  P  both  satisfy 

~  -1/2  ~ 

=  Oj,(ctN  )•  80  that  if  a  -*  0  or  g  does  not  depend  on  P.  then  0  is 

asymptotically  normally  distributed  with  mean  0  and  covariance  matrix 

(5.1)  (2  +  k)  (4N  f(P.e)}'1. 

As  mentioned  in  Section  3,  under  the  small  a  asymptotics  of  Theorem  3.3, 
the  extended  quasi-likelihood  estimator  of  0  is  asymptotically  equivalent  to 
the  estimators  here  with  asymptotic  covariance  matrix  (5.1).  Thus,  if  g  does 
not  depend  on  P  or  a  -»  0,  pseudo- likelihood,  weighted  squared  residuals, 
restricted  maximum  likelihood,  maximum  likelihood  and,  if  a  -»  0,  extended 
quasi-likelihood,  are  all  asymptotically  equivalent.  In  addition,  all  of  these 
estimators  have  Influence  functions  which  are  linear  in  the  squared  errors, 
indicating  substantial  nonrobustness. 

We  may  also  observe  that  these  methods  are  preferable  to  unweighted 
regression  on  squared  residuals.  Write  (5.1)  as 


(5.2) 


(1/2  +  k/A)  (WV  W)  . 


where  V  is  the  N  x  N  diagonal  matrix  with  elements  and  W  is  the  N  x  p  matrix 
with  i**1  row  H*.  For  the  unweighted  estimator  based  on  squared  residuals, 
calculations  similar  to  those  above  show  that  the  asymptotic  covariance  matrix 
when  either  g  does  not  depend  on  0  or  a  -»  0  is  given  by 

(5.3)  (1/2  +  x/4)  (WtW)“1(WtVW)(WtW)'1. 

The  comparison  between  (5.2)  and  (5.3)  is  simply  that  of  the  Gauss-Harkov 
theorem,  so  that  (5.2)  is  no  larger  than  (5.3). 

5.2  Logarithms  of  absolute  residuals  and  the  effect  of  inliers 

We  do  not  consider  deletion  of  the  few  smallest  absolute  residuals.  Here 
T(x)  =  log  x  so  that  T(x)  =  x  Letting  tj  =  log  a  and  assuming  independent 
and  identically  distributed  errors  we  have  =  tj  +  u(i,0.0)  +  E  log  |e|,  = 

1.  and  Hj  =  r(i.p.O).  Under  the  assumption  of  symmetry  of  the  errors,  with  g 
not  depending  on  0  or  a  -»  0,  tedious  algebra  shows  that  0  is  asymptotically 
normally  distributed  with  mean  0  and  covariance  matrix 

(5.4)  var  {log  (  |e |2)>  {4N  f(0.0)}_1. 

The  influence  function  for  this  estimator  is  linear  in  the  logarithm  of  the 
absolute  errors.  This  indicates  nonrobustness  more  for  inliers  than  for 


outliers,  which  at  the  very  least  is  an  unusual  phenomenon.  If  the  errors  are 
not  symmmetric  then  there  will  be  an  additional  effect  due  to  estimating  0  not 


present  for  the  methods  of  Section  5.1,  even  if  g  does  not  depend  on  p 


5.3  Weighted  Absolute  Residuals 


Assume  that  the  errors  are  independent  and  identically  distributed  and  let 
exp (t?)  =  crE|fc|.  Consider  the  weighted  estimator.  We  have  T(x)  =  x.  T(x)  =  1. 
Mj  =  exp(rj)  g(Zj,0,0)  and  Thus,  if  the  errors  are  symmetrically 

a 

distributed  and  either  g  does  not  depend  on  P  or  a  -*  0,  0  is  asymptotically 
normally  distributed  with  mean  0  and  covariance  matrix 


(5.5) 


{6/(1  -  6)}  (N  £(P.0)}  . 


where  6  =  var  |e|.  The  Influence  function  for  this  estimator  is  linear  in  the 
absolute  errors.  By  an  argument  similar  to  that  at  the  end  of  Section  5.1,  we 
may  conclude  that  when  the  effect  of  PM  is  negligible  one  should  use  a  weighted 
estimator  and  iterate  the  method. 


5.4  General  transformations 


One  may  also  consider  other  power  transformations  of  absolute  residuals. 
If  X  ^  0  is  the  power  of  absolute  residuals  on  which  the  regression  is  based, 
then  define  tj  by  exp(Xt))  *  a ^  E(|t|^)  and  T(x)  =  x\  Then  Mj  =  exp(XTj) 

g^(z J./3.0).  Vj  =  Straightforward  calculations  show  that  if  the  errors  are 

>\ 

symmetric  and  either  g  does  not  depend  on  P  or  a  -»  0,  then  0  is  asymptotically 
normally  distributed  with  mean  0  and  asymptotic  covariance  matrix 


(5.6) 


[var  (  |e|  )  /  {  E  (kl  )  }]  {  X2  N  f(0.0)  f\ 


with  influence  function  linear  in  |e|\  For  square  root  transformations,  for 
example,  X  =  1/2,  and  from  (5.1)  and  (5.6),  the  asymptotic  relative  efficiency 
of  the  square  root  transformation  relative  to  pseudo- likelihood  under  normal 
errors  is  0.693;  from  (5.5),  the  efficiency  relative  to  weighted  absolute 
residuals  is  0.791. 

At  this  point  it  is  worthwhile  to  mention  that  under  the  simplifying 
assumptions  of  our  discussion,  the  precision  of  general  regression  estimators 
does  not  depend  on  a.  since  a  general  expression  such  as  (5.6)  is  independent 
of  Tj.  Thus,  how  well  we  estimate  0  in  many  practical  cases  will  be 
approximately  independent  of  o.  Furthermore,  when  the  power  of  the  mean  model 
for  variance  (1.2)  holds.  Vg(i,p,6)  =  log  .  so  that  f(/3,0)  is  the  limiting 
variance  of  the  (log  p^}.  From  the  general  expression  (5.6),  the  precision 
with  which  one  can  estimate  0  depends  only  on  the  relative  spread  of  the  mean 
responses,  not  their  actual  sizes,  and  clearly  this  spread  must  be  fairly 
substantial  so  that  the  spread  of  the  logarithms  of  the  means  will  be  so  as 
well.  The  Implications  are  that  for  (1.2),  the  design  will  play  an  important 
role  in  efficiency  of  estimation  of  0,  and  in  some  practical  situations  we  may 
not  be  able  to  estimate  0  well  no  matter  which  estimator  we  employ. 

5.5  Comparison  of  methods  based  on  residuals 


We  assume  that  the  errors  are  symmetric  and  independent  and  identically 
distributed  and  that  either  g  does  not  depend  on  P  or  a  is  small.  By  (5.1), 
(5.4)  and  (5.5),  the  asymptotic  relative  efficiency  of  the  three  methods 
depends  only  on  the  distribution  of  the  errors.  For  normal  errors,  using 
absolute  residuals  results  in  a  12%  loss  in  efficiency  while  for  standard 
double  exponential  errors  there  is  a  25%  gain  in  efficiency  for  using  absolute 
residuals.  For  normal  errors,  the  logarithm  method  represents  a  59%  loss  of 


efficiency  with  respect  to  pseudo- likelihood. 

In  Table  2  we  present  asymptotic  relative  efficiencies  for  various 
contaminated  normal  distributions.  The  asymptotic  relative  efficiency  of  the 
weighted  absolute  residual  method  to  pseudo- likelihood  is  the  same  as  the 
asymptotic  relative  efficiency  of  the  mean  absolute  deviation  with  respect  to 
the  sample  variance  for  a  single  sample,  see  Huber  (1981,  page  3);  the  first 
column  of  the  table  is  thus  identical  to  that  of  Huber.  The  table  shows  that 
while  at  normality  neither  the  absolute  residuals  nor  the  logarithm  methods  are 
efficient,  a  very  slight  fraction  of  "bad”  observations  is  enough  to  offset  the 
superiority  of  squared  residuals  in  a  dramatic  fashion.  For  example,  just  two 
bad  observations  in  1000  negate  the  superiority  of  squared  residuals.  If  1%  or 
5%  of  the  data  are  "bad,”  absolute  residuals  and  the  logarithm  method, 
respectively,  show  substantial  gains  over  squared  residuals.  The  implication 
is  that  while  it  is  commonly  perceived  that  methods  based  on  squared  residuals 
are  to  be  preferred  in  general,  these  methods  can  be  highly  non-robust.  Our 
formulation  includes  this  result  for  maximum  likelihood,  showing  its  Inadequacy 
under  slight  departures  from  the  assumed  distributional  structure.  We  also 
include  asymptotic  relative  efficiencies  for  appropriately  weighted  residual 
methods  based  on  square,  cube  and  2/3  roots  to  pseudo-likelihood  using  (5.6) 
and  observe  that  these  methods  also  exhibit  comparative  robustness  to 
contamination. 

5.6  Methods  based  on  sample  standard  deviations 

Assume  that  m  £  2  replicate  observations  are  available  at  each  design 
point.  In  practice,  m  is  usually  small  ,  see  Raab  (1981).  We  compare  using 
absolute  residuals  to  using  sample  standard  deviations  in  the  estimators  of 
Section  3.1.  We  assume  that  one  is  fairly  confident  in  the  postulated  form  of 
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the  model,  thus  viewing  methods  based  on  sample  standard  deviations  as  not 

taking  full  advantage  of  the  information  available.  For  simplicity,  assume 

that  the  errors  are  independent  and  identically  and  symmetrically  distributed 

and  that  either  g  does  not  depend  on  P  or  a  is  small.  If  the  errors  are  not 

symmetric  and  a  is  not  small  or  the  variance  depends  on  p,  using  sample 

standard  deviations  presumably  will  be  more  efficient  than  in  the  discussion 

below.  This  issue  deserves  further  attention. 

2 

Let  s  be  the  sample  variance  of  m  errors  (e, . e  }.  It  is  easily  shown 

m  l  m 

by  calculations  analagous  to  those  of  section  5.1  that  replacing  absolute 
residuals  by  sample  standard  deviations  has  the  effect  of  changing  the 
asymptotic  covariance  matrices  (5.1),  (5.4)  and  (5.5)  to 

Pseudo- likelihood  :  {(2  +  k)  +  2/(m  -  1)}  (4N  f(0,0)}-1  ; 

(5.7)  Logarithm  method  :  m  var  (  log  (s^)  )  (4N  f(P,0)}  1  : 

Weighted  absolute  residuals  :  (m  6*  /  (1  -  6^)}  (N  f(P,0)}  *, 

where  6  =  var  (s  ).  Table  3  contains  the  asymptotic  relative  efficiencies  of 

using  sample  standard  deviations  to  using  transformations  of  absolute  residuals 

for  various  values  of  m  when  the  errors  are  standard  normal.  The  values  in  the 
2 

table  for  T(x)  =  x  and  x  indicate  that  if  the  data  are  approximately  normally 
distributed,  using  sample  standard  deviations  can  entail  a  loss  in  efficiency 
with  respect  to  using  residuals  if  m  is  small.  For  substantial  replication  (m 
1  10),  using  sample  standard  deviations  produces  a  slight  edge  in  efficiency 
with  respect  to  weighted  absolute  residuals  for  T(x)  =  x. 

The  second  column  of  Table  3  shows  that  for  the  logarithm  method,  using 
sample  standard  deviations  surpasses  using  residuals  in  terms  of  efficiency 
except  when  m  =  2  and  is  more  than  twice  as  efficient  for  large  m.  In  its  raw 
form,  log  | r ^  |  is  very  unstable  because,  at  least  occasionally,  | r ^  |  ~  0, 
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producing  a  wild  "outlier"  in  the  regression.  The  effect  of  using  sample 
standard  deviations  is  to  decrease  the  possibility  of  such  inliers:  the  sample 
standard  deviations  will  be  likely  more  uniform,  especially  as  m  increases. 
The  implication  is  that  the  logarithm  method  should  not  be  based  on  residuals 
unless  remedial  measures  are  taken.  The  suggestion  to  trim  a  few  of  the 
smallest  absolute  residuals  before  using  this  method  is  clearly  supported  by 
the  theory;  presumably,  such  triiming  would  reduce  or  negate  the  theoretical 
superiority  of  using  sample  standard  deviations. 

Table  4  contains  the  asymptotic  relative  efficiencies  of  weighted  squared 
sample  standard  deviations  and  logarithms  of  these  to  weighted  squared 
residuals  under  normality  of  the  errors.  The  first  column  is  the  efficiency  of 
Raab’s  method  to  pseudo- like 11 hood,  and  the  second  column  is  the  efficiency  of 
the  Rodbard  and  Frazier  method  to  pseudo- likelihood.  The  results  of  the  table 
imply  that  using  the  Raab  and  Rodbard  and  Frazier  methods,  which  are  popular  in 
the  analysis  of  radioimmunoassay  data,  can  entail  a  loss  of  efficiency  when 
compared  to  methods  based  on  weighted  squared  residuals.  Davidian  (1986)  has 
shown  that  the  Rodbard  and  Frazier  estimator  can  have  a  slight  edge  in 


efficiency  over  the  weighted  squared  residuals  methods  for  some  highly 
contaminated  normal  distributions.  From  (5.7),  the  squared  residual  methods 
will  be  more  efficient  than  Raab’s  method  in  the  limit.  Also  note  that  the 
entries  for  T(x)  =  x  and  log  x  in  Table  3  for  m  =  00  are  the  reciprocals  of  the 
first  row  of  Table  2  and  that  the  entries  for  last  row  of  Table  4  are  1.0; 
thus  if  both  N  and  m  grow  large  all  the  methods  yield  the  same  results. 

Table  4  also  addresses  the  open  question  as  to  whether  Raab’s  method  is 
asymptotically  more  efficient  than  the  Rodbard  and  Frazier  method  for  normally 
distributed  data.  The  answer  is  a  general  yes,  thus  agreeing  with  the 
Monte-Carlo  evidence  available  when  the  variance  is  a  power  of  the  mean.  The 
results  of  this  section  suggest  that  in  the  case  of  assay  data  containing  pairs 


mmm 


for  which  only  is  observed,  an  estimator  for  0  combining  estimation  based 
on  residuals  for  the  observations  for  which  xi  is  known  and  on  standard 
deviations  otherwise  in  an  appropriately  weighted  fashion  would  offer  some 
improvement  over  the  methods  currently  employed;  see  Carroll,  Davidian  and 
Smith  (1986). 

6.  DISCUSSION 

In  Section  4  we  constructed  a  general  theory  of  regression-type  estimation 
for  0  in  the  heteroscedastic  model  (1.1).  This  theory  includes  as  special 
cases  common  methods  described  in  Section  3  and  allows  for  the  regression  to  be 
based  on  absolute  residuals  from  the  current  regression  fit  as  well  as  sample 
standard  deviations  in  the  event  of  replication  at  each  design  point.  Under 
various  restrictions  such  as  symmetry  or  small  a.  when  the  variance  function  g 
does  not  depend  on  /3,  we  showed  in  Sections  4  and  5  that  we  can  draw  general 
conclusions  about  this  class  of  estimators  as  well  as  make  comparisons  among 
the  various  methods. 

When  employing  methods  based  on  residuals,  one  should  weight  the  residuals 
appropriately  and  iterate  the  process.  There  can  be  large  relative  differences 
among  the  methods  in  terms  of  efficiency.  Under  symmetry  of  the  errors, 
squared  residuals  are  preferable  for  approximately  normally  distributed  data, 
but  this  preference  is  tenuous,  these  can  be  highly  non-robust  under  only 
slight  departures  from  normality;  methods  based  on  logarithms  or  the  absolute 
residuals  themselves  exhibit  relatively  more  robust  behavior.  For  the  small 
amount  of  replication  found  in  practice,  using  sample  standard  deviations 
rather  than  residuals  can  entail  a  loss  in  efficiency  if  estimation  is  based  on 
the  squares  of  these  quantities  or  the  quantities  themselves.  For  the 
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logarithm  method  based  on  residuals,  trimming  the  smallest  few  absolute 


residuals  is  essential,  since  for  normal  data  using  sample  standard  deviations 


is  almost  always  more  efficient  than  using  residuals,  even  for  a  small  number 


of  replicates.  Popular  methods  applications  such  as  radioimmunoassay  based  on 


sample  means  and  sample  standard  deviations  can  be  less  efficient  than  methods 


based  on  weighted  squared  residuals.  In  some  instances,  the  precision  with 


which  we  can  estimate  0  depends  on  the  relative  range  of  values  of  the  mean 


responses,  not  thein  actual  values,  so  that  immediate  implications  for  design 


are  suggested. 


Efficient  variance  function  estimation  in  heteroscedastic  regression 


analysis  is  an  important  problem  in  its  own  right.  There  are  important 


differences  in  estimators  for  variance  when  it  is  modeled  parametrically. 
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APPENDIX  A.  PROOFS  OF  MAJOR  RESULTS 

We  now  present  sketches  of  the  proofs  of  the  theorems  of  Section  4.  Our 
exposition  is  brief  and  nonrigorous  as  our  goal  is  to  provide  general  insights. 
In  what  follows,  we  assume  that 


(A-  1 ) 


A 


under  sufficient  regularity  conditions  it  is  possible  to  prove  (A.l).  Such  a 

proof  would  be  long,  detailed  and  essentially  noninformative;  see  Carroll  and 

1/2 

Ruppert  (1982a)  for  a  proof  of  N  consistency  in  a  special  case. 

Sketch  of  proof  of  Theorem  4.1-  From  (4.1),  a  Taylor  series,  the  fact  that  E  [ 
T(di(/3)}  ]  =  Mj  and  laws  of  large  numbers,  we  have 

(A. 2)  0  =  N‘1/2^=1  (H1/Vi)[T{di(^)>  -  +  op(l) 


By  the  arguments  of  Ruppert  and  Carroll  (1980)  or  Carroll  and  Ruppert  (1982a), 
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(A.3) 


N"1/2^=l  -  T{di(P)}] 


=  N"1/2^J=1  (H/Vj)  f{d1(P)}{d1(^)  -  d^P)}  +  o  (l) 


=  B3.N  Nl/2^  -  +  V1}- 


Applying  this  result  to  (A. 2)  along  with  a  Taylor  series  in  Mj  gives 


o  =  n-1/2^=1  c,  ♦  (B2iN  *  b3>n)  N1/2(Pm  -  P) 


TJ  -  T) 


-  B,  „  N  .  +  o 

i.N  e  .  0  P 


+  o  (1). 


which  is  (4.2). 


Theorem  4.2  follows  by  a  similar  argument;  in  this  case  the  representation 


(A.3)  is  unnecessary. 


Sketch  of  proof  of  Theorem  4.3’.  We  consider  Theorem  4.2;  the  proof  for  Theorem 


4.1  is  similar.  Recall  here  that  (3.4)  holds.  In  the  following,  all 


derivatives  are  with  respect  to  the  mean  p^  and  the  definitions  of  and  Mi 


are  as  in  (4.4) . 


Assumption  (iv)  implies  that  | -  p^  |  0  so  that  a  Taylor 


series  in  77,  0  and  gives 


(*.4)  B,  „  NI/2  r  ”  -  N-1/22^ml  C,  -N-I/2^1(H1H,/Vi)(V1.  -  4,) 


VVW.vV.V.V  V  •*.  V.  / 


+  N"1/2^=i  <(H1/V1)  -  (^1/V1)}(Y1.-  Mi)  +  op(l). 


—  —  -1/2  —  — 

Since  Yi#  -  Pj  =  a  g(Mi>zi>0)  ei#~  XN  gfPj.Zj.O)  £i#.  where  is  the  mean 

of  the  errors  at  Xj ,  we  can  write  the  last  two  terms  on  the  right-hand  side  of 

(A. 4)  as 


(A. 5) 


XN 


“4,  v 


(q 


i.l 


+  qi.2ci) 


for  constants  (qi  j).  By  assumption  (v),  since  ei#  has  mean  zero.  (A. 5) 

converges  in  probability  to  zero  if  =  0,  which  holds  under  the 

assumption  of  symmetry.  Thus.  (A. 5)  converges  to  zero  which  from  (A.  4) 
completes  the  proof.  Note  that  if  we  drop  the  assumption  of  symmetry,  from 
(A. 5)  the  asymptotic  normal  distribution  of  N  (0  -  0)  will  have  mean 
P-Hm  (  X  B-|n  IT1^  (  I,  Ct„1>2  )}. 


APPENDIX  B.  CHARACTERIZATION  OF  RESTRICTED  MAXIMUM  LIKELIHOOD 

A 

Let  be  a  generalized  least  squares  estimator  for  0.  Assume  first  that 
g  does  not  depend  on  0.  Let  the  prior  distribution  for  the  parameters  tt(0,0.o) 
be  proportional  to  a  The  marginal  posterior  for  0  is  hard  to  compute  in 

closed  form  for  nonlinear  regression.  Following  Box  and  Hill  (1974)  and  Beal 
and  Sheiner  (1986),  we  have  the  linear  approximation 

f(xr0)  x  f(xi.pj  +  f p(xi.0w)t(0-0w). 

Replacing  f(x^,0)  by  Its  linear  expansion,  the  marginal  posterior  for  0  is 


proportional  to 


t 


/(Q)  {Det  SG(0)}“ 


^(0)  =  (N-pf^r2  /  g2(ziiM.0). 


SG(0)  =  N-^f  .(x1.Ji,)t  /  g2(zi-?i|.0). 


and  where  Det  A  =  determinant  of  A.  If  the  variances  depend  on  p,  we  extend 


the  Bayesian  arguments  by  replacing  g.(0)  by  g(z. ,P  ,0). 


Let  H  be  the  hat  matrix  H  evaluated  at  (3*  and  let  h^  =  h^JP^.O).  From 


(3.1),  pseudo- likelihood  solves  in  ( 0 , cr) 


(B.2)  2^=1  [^/{aVCZi-^.O)}] 


y0(Zi.PH.0) 


Vzi>PH-0) 


Since  H  is  idempotent.  the  left  hand  side  of  (B.2)  has  approximate  expectation 


(B.3) 


1  -  p/N 


ve(zi,Pif.6)  (1-  hn) 


To  modify  pseudo- likelihood  to  account  for  loss  of  degrees  of  freedom,  equate 


the  left  hand  side  of  (B.2)  to  (B.3).  From  matrix  computations  as  in  Nel 


(1980),  this  can  be  shown  to  be  equivalent  to  restricted  maximum  likelihood. 


.  /  s .  <  s  s,  S'  S'  S'.-  S'  s.  .  r  ....... 

■s?s?f~*ys^sys^s^jf's'sys:S'Ss;syf'S'S'’S'..sS'y-s':S'S'.\s^sS'.-.-s'ss'.-.-S'.-; 


r.  ** 


IWI IW  \JV  JW'JWUV  Wl  \ju  uw  www-iwwiwwnmiiwi 


Table  3 


Asymptotic  relative  efficiency  of  regression  methods  based  on  a  function  T 
of  sample  standard  deviations  relative  to  using  regression  methods  based  on 
a  function  T  of  absolute  residuals  under  normality  for  T(x)  (weighted 
methods) . 
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Table  4 


Asymptotic  relative  efficiency  of  regression  methods  based  on  a  function  T 
of  sample  standard  deviations  relative  to  regression  methods  based  on 
weighted  squared  residuals  under  normal  errors. 


